Abstract. The network virtualization paradigm envisions an Internet where arbitrary virtual networks (VNets) can be specified and embedded over a shared substrate (e.g., the physical infrastructure). As VNets can be requested at short notice and for a desired time period only, the paradigm enables a flexible service deployment and an efficient resource utilization.

This paper investigates the security implications of such an architecture. We consider a simple model where an attacker seeks to extract secret information about the substrate topology, by issuing repeated VNet embedding requests. We present a general framework that exploits basic properties of the VNet embedding relation to infer the entire topology. Our framework is based on a graph motif dictionary applicable for various graph classes. We provide upper bounds on the request complexity, the number of requests needed by the attacker to succeed. Moreover, we present some experiments on existing networks to evaluate this dictionary-based approach.

1 Introduction

While network virtualization enables a flexible resource sharing, opening the infrastructure for automated virtual network (VNet) embeddings or service deployments may introduce new kinds of security threats. For example, by virtualizing its network infrastructure (e.g., the links in the aggregation or backbone network, or the computational or storage resources at the points-of-presence), an Internet Service Provider (ISP) may lose control over how its network is used. Even if the ISP manages the allocation and migration of VNet slices and services itself and only provides a very rudimentary interface to interact with customers (e.g., service or content providers), an attacker may infer information about the network topology (and state) by generating VNet requests.

This paper builds upon the model introduced in [11] and studies the complexity of the topology extraction problem: How many VNet requests are required to infer the full topology of the infrastructure network? While algorithms for trees and cactus graphs with request complexity $O(n)$ and a lower bound for general graphs of $\Omega(n^2)$ have been shown in [11], graph classes between these extremes have not been studied.

Contribution. This paper presents a general framework to solve the topology extraction problem. We first describe necessary and sufficient conditions which facilitate the "greedy" exploration of the substrate topology (the host graph $H$) by iteratively extending the requested VNet graph (the guest graph $G$). Our framework then exploits these conditions to construct an ordered (request) dictionary defined over so-called graph motifs. We show how to apply the framework to different graph families, discuss the implications on the request complexity, and also report on a small simulation study on realistic topologies. These empirical results show that many scenarios can indeed be captured with a small dictionary, and small motifs are sufficient to infer if not the entire, then at least a significant fraction of the topology.

2 Background 

This section presents our model and discusses how it compares to related work. 

Model. The VNet embedding based topology extraction problem has been introduced in [11]. The formal setting consists of two entities: a customer (the "adversary") that issues virtual network (VNet) requests and a provider that performs the access control and the embedding of VNets. We model the virtual network requests as simple, undirected graphs $G = (V, E)$ (the guest graph) where $V$ denotes the virtual nodes and $E$ denotes the virtual edges connecting nodes in $V$. Similarly, the infrastructure network is given as an undirected graph $H = (V, E, w)$ (the so-called host graph or substrate) as well, where $V$ denotes the set of substrate nodes, $E$ is the set of substrate links, and $w$ is a capacity function describing the available resources on a given node or edge. Without loss of generality, we assume that $H$ is connected and that there are neither parallel edges nor self-loops in VNet requests or in the substrate.

In this paper we assume that besides the resource demands, the VNet requests do 
not impose any mapping restrictions, i.e., a virtual node can be mapped to any sub- 
strate node, and we assume that a virtual link connecting two substrate nodes can be 
mapped to an entire (but single) path on the substrate as long as the demanded capacity 
is available. These assumptions are typical for virtual networks [5].

A virtual link which is mapped to more than one substrate link however can entail 
certain costs at the relay nodes, the substrate nodes which do not constitute endpoints of 
the virtual link and merely serve for forwarding. We model these costs with a parameter 
$\epsilon > 0$ (per link). Moreover, we also allow multiple virtual nodes to be mapped to the
same substrate node if the node capacity allows it; we assume that if two virtual nodes
are mapped to the same substrate node, the cost of a virtual link between them is zero.

Definition 1 (Embedding $\pi$, Relation $\mapsto$). An embedding of a graph $A = (V_A, E_A, w_A)$ to a graph $B = (V_B, E_B, w_B)$ is a mapping $\pi : A \to B$ where every node of $A$ is mapped to exactly one node of $B$, and every edge of $A$ is mapped to a path of $B$. That is, $\pi$ consists of a node mapping $\pi_V : V_A \to V_B$ and an edge mapping $\pi_E : E_A \to P_B$, where $P_B$ denotes the set of paths. We will refer to the set of virtual nodes embedded on a node $v_B \in V_B$ by $\pi_V^{-1}(v_B)$; similarly, $\pi_E^{-1}(e_B)$ describes the set of virtual links passing through $e_B \in E_B$ and $\pi_E^{-1}(v_B)$ describes the virtual links passing through $v_B \in V_B$ with $v_B$ serving as a relay node.

To be valid, the embedding $\pi$ has to fulfill the following properties: (i) Each node $v_A \in V_A$ is mapped to exactly one node $v_B \in V_B$ (but given sufficient capacities, $v_B$ can host multiple nodes from $V_A$). (ii) Links are mapped consistently, i.e., for two nodes $v_A, v'_A \in V_A$, if $e_A = \{v_A, v'_A\} \in E_A$ then $e_A$ is mapped to a single (possibly empty and undirected) path in $B$ connecting nodes $\pi(v_A)$ and $\pi(v'_A)$. A link $e_A$ cannot be split into multiple paths. (iii) The capacities of substrate nodes are not exceeded: $\forall v_B \in V_B: \sum_{u \in \pi_V^{-1}(v_B)} w(u) + \epsilon \cdot |\pi_E^{-1}(v_B)| \le w(v_B)$. (iv) The capacities in $E_B$ are respected as well, i.e., $\forall e_B \in E_B: \sum_{e \in \pi_E^{-1}(e_B)} w(e) \le w(e_B)$.

If there exists such a valid embedding mapping $\pi$, we say that graph $A$ can be embedded in $B$, denoted by $A \mapsto B$. Hence, $\mapsto$ denotes the VNet embedding relation.
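The four validity properties can be checked mechanically for a given candidate mapping. The following is a minimal sketch, not part of the original model description: the dictionary-based graph encoding, the function name `is_valid_embedding`, and the requirement that paths be simple are assumptions made for illustration only.

```python
def is_valid_embedding(node_map, edge_map, guest, host, eps):
    """Check properties (i)-(iv) of Definition 1 for a candidate embedding.

    guest / host: {"nodes": {name: demand or capacity},
                   "edges": {frozenset({u, v}): demand or capacity}}
    node_map: virtual node -> substrate node
    edge_map: virtual edge (frozenset) -> list of substrate nodes forming a path
    eps:      relay cost per virtual link at each forwarding node
    """
    # (i) every virtual node is mapped to exactly one existing substrate node
    if set(node_map) != set(guest["nodes"]):
        return False
    if any(b not in host["nodes"] for b in node_map.values()):
        return False

    node_load = {b: 0.0 for b in host["nodes"]}
    edge_load = {e: 0.0 for e in host["edges"]}
    for a, b in node_map.items():
        node_load[b] += guest["nodes"][a]

    # (ii) every virtual edge maps to one path between the images of its endpoints
    if set(edge_map) != set(guest["edges"]):
        return False
    for virtual_edge, path in edge_map.items():
        u, v = tuple(virtual_edge)
        if not path or {path[0], path[-1]} != {node_map[u], node_map[v]}:
            return False
        if len(set(path)) != len(path):
            return False  # assumption: paths are simple
        for x, y in zip(path, path[1:]):
            hop = frozenset((x, y))
            if hop not in host["edges"]:
                return False
            edge_load[hop] += guest["edges"][virtual_edge]
        for relay in path[1:-1]:
            node_load[relay] += eps  # relay cost on pure forwarding nodes

    # (iii) node capacities and (iv) edge capacities are respected
    return (all(node_load[b] <= host["nodes"][b] for b in host["nodes"])
            and all(edge_load[e] <= host["edges"][e] for e in host["edges"]))


# Toy instance: one virtual link mapped over the two-hop substrate path a-b-c.
host = {"nodes": {"a": 1, "b": 1, "c": 1},
        "edges": {frozenset("ab"): 1, frozenset("bc"): 1}}
guest = {"nodes": {"x": 1, "y": 1}, "edges": {frozenset("xy"): 1}}
ok = is_valid_embedding({"x": "a", "y": "c"},
                        {frozenset("xy"): ["a", "b", "c"]}, guest, host, eps=0.1)
```

In this toy instance node `b` only relays and therefore carries a load of `eps`, which matches the role of relay nodes in property (iii).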

The provider has a flexible choice where to embed a VNet as long as a valid mapping 
is chosen. In order to design topology discovery algorithms, we exploit the following 
property of the embedding relation. 

Lemma 1. The embedding relation $\mapsto$ applied to any family $\mathcal{G}$ of undirected graphs (short: $(\mathcal{G}, \mapsto)$), forms a partially ordered set (a poset). [Proof in Appendix]

We are interested in algorithms that "guess" the target topology $H$ (the host graph) among the set $\mathcal{H}$ of possible substrate topologies. Concretely, we assume that given a VNet request $G$ (a guest graph), the substrate provider always responds with an honest (binary) reply $R$ informing the customer whether the requested VNet $G$ is embeddable on the substrate $H$. Based on this reply, the attacker may then decide to ask the provider to embed the corresponding VNet $G$ on $H$, or it may not embed it and continue asking for other VNets. Let Alg be an algorithm asking a series of requests $G_1, \ldots, G_t$ to reveal $H$. The request complexity to infer the topology is measured in the number of requests $t$ (in the worst case) until Alg issues a request $G_t$ which is isomorphic to $H$ and terminates (i.e., Alg knows that $H = G_t$ and does not issue further requests).

Related Work. Embedding VNets is an intensively studied problem and there exists a large body of literature (e.g., [7,9,12,14]), also on distributed computing approaches [8] and online algorithms [3,6]. Our work is orthogonal to this line of literature in the sense that we assume that an (arbitrary and not necessarily resource-optimal) embedding algorithm is given. Instead, we focus on the question of how the feedback obtained through these algorithms can be exploited, and we study the implications on the information which can be obtained about a provider's infrastructure.

Our work studies a new kind of topology inference problem. Traditionally, much graph discovery research has been conducted in the context of today's complex networks such as the Internet which have fascinated scientists for many years, and there exists a wealth of results on the topic. The classic instrument to discover Internet topologies is traceroute [4], but the tool has several problems which make the task challenging. One complication of traceroute stems from the fact that routers may appear as stars (i.e., anonymous nodes), which renders the accurate characterization of Internet topologies difficult [1,10,13]. Network tomography is another important field of topology discovery. In network tomography, topologies are explored using pairwise end-to-end measurements, without the cooperation of nodes along these paths. This approach is quite flexible and applicable in various contexts, e.g., in social networks. For a good discussion of this approach as well as results for a routing model along shortest and second shortest paths see [2]. For example, [2] shows that for sparse random graphs, a relatively small number of cooperating participants is sufficient to discover a network fairly well. Both the traceroute and the network tomography problems differ from our virtual network topology discovery problem in that the exploration there is inherently path-based while we can ask for entire virtual graphs.

The paper closest to ours is [11]. It introduces the topology extraction model studied in this paper, and presents an asymptotically optimal algorithm for the cactus graph family (request complexity $O(n)$), as well as a general algorithm (based on spanning trees) with request complexity $O(n^2)$.

3 Motif-Based Dictionary Framework 

The algorithms for tree and cactus graphs presented in [11] can be extended to a framework
for the discovery of more general graph classes. It is based on the idea of growing
sequences of subgraphs from nodes discovered so far. Intuitively, in order to describe
the "knitting" of a given part of a graph, it is often sufficient to use a small set of graph 
motifs, without specifying all the details of how many substrate nodes are required to 
realize the motif. We start this section with the introduction of motifs and their compo- 
sition and expansion. Then we present the dictionary concept, which structures motif 
sequences in a way that enables the efficient host graph discovery with algorithm DiCT. 
Subsequently, we give some examples and finally provide the formal analysis of the 
request complexity. 

3.1 Motifs: Composition and Expansion 

In order to define the motif set of a graph family $\mathcal{H}$, we need the concept of a chain (graph) $C$: $C$ is just a graph $G = (\{v_1, v_2\}, \{\{v_1, v_2\}\})$ consisting of two nodes and a single link. As its edge represents a virtual link that may be embedded along an entire path in the substrate network, it is called a chain.

Definition 2 (Motif). Given a graph family $\mathcal{H}$, the set of motifs of $\mathcal{H}$ is defined constructively: If any member $H \in \mathcal{H}$ has an edge cut of size one, the chain $C$ is a motif for $\mathcal{H}$. All remaining motifs are at least 2-connected (i.e., any pair of nodes in a motif is connected by at least two vertex-disjoint paths). These motifs can be derived from the at least 2-connected components of any $H \in \mathcal{H}$ by repeatedly removing all nodes with degree at most two from $H$ (such nodes do not contribute to the knitting) and merging the incident edges, as long as all remaining cycles do not contain parallel edges. Only one instance of isomorphic motifs is kept.

Note that the set of motifs of $\mathcal{H}$ can also be computed by iteratively removing all low-degree nodes and subsequently determining the graphs connecting nodes constituting a vertex-cut of size one for each member $H \in \mathcal{H}$. In other words, the motif set $\mathcal{M}$ of a graph family $\mathcal{H}$ is a set of non-isomorphic minimal (in terms of number of nodes) graphs that are required to construct each member $H \in \mathcal{H}$ by taking a motif and either replacing edges with two edges connected by a node or gluing together components several times. More formally, a graph family containing all elements of $\mathcal{H}$ can be constructed by applying the following rules repeatedly.
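The node-removal step of the motif derivation can be sketched in a few lines. This is an illustrative sketch under simplifying assumptions, not the authors' procedure: it works on a single connected graph given as an edge list with sortable node names, removes nodes of degree at most one, and contracts a degree-two node only if its two neighbours are not yet adjacent (so that no parallel edge arises). Splitting the graph at cut vertices into 2-connected components is not covered, and the function name `reduce_to_motif` is hypothetical.

```python
def reduce_to_motif(edges):
    """Strip nodes that do not contribute to the 'knitting' of a graph.

    edges: iterable of (u, v) pairs of a simple undirected graph.
    Returns the remaining edges as a set of frozensets.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    changed = True
    while changed:
        changed = False
        for node in sorted(adj):
            nbrs = adj[node]
            if len(nbrs) <= 1:
                # pendant (or isolated) node: remove it
                for n in nbrs:
                    adj[n].discard(node)
                del adj[node]
                changed = True
                break
            if len(nbrs) == 2:
                a, b = sorted(nbrs)
                if b in adj[a]:
                    continue  # merging the two edges would create a parallel edge
                # contract the degree-two node into a single edge {a, b}
                adj[a].discard(node)
                adj[b].discard(node)
                adj[a].add(b)
                adj[b].add(a)
                del adj[node]
                changed = True
                break
    return {frozenset((u, v)) for u in adj for v in adj[u]}


# A 6-cycle shrinks to a triangle; a subdivided K4 with a pendant node shrinks to K4.
cycle6 = [(i, (i + 1) % 6) for i in range(6)]
k4_expanded = [(0, 4), (4, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (2, 5)]
triangle = reduce_to_motif(cycle6)
k4 = reduce_to_motif(k4_expanded)
```

A tree is reduced to the empty graph by this sketch, which is consistent with the chain $C$ being handled as a separate motif in Definition 2.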



Definition 3 (Rules). (1) Create a new graph consisting of a motif $M \in \mathcal{M}$ (New Motif Rule). (2) Given a graph created by these rules, replace an edge $e$ of this graph by a new node and two new edges connecting the incident nodes of $e$ to the new node (Insert Node Rule). (3) Given two graphs created by these rules, attach them to each other such that they share exactly one node (Merge Rule).

Being the inverse operations of the ones to determine the motif set, these rules are sufficient to compose all graphs in $\mathcal{H}$: If $\mathcal{M}$ includes all motifs of $\mathcal{H}$, it also includes all 2-connected components of $H$, according to Definition 2. These motifs can be glued together using the Merge Rule, and eventually the low-degree nodes can be added using the Insert Node Rule. Therefore, we have the following lemma.

Lemma 2. Given the motifs $\mathcal{M}$ of a graph family $\mathcal{H}$, the repeated application of the rules in Definition 3 allows us to construct each member $H \in \mathcal{H}$.
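The Insert Node Rule and the Merge Rule are plain graph operations. A minimal sketch follows, assuming graphs are stored as sets of `frozenset` edges; the helper names `insert_node` and `merge` and the `(tag, name)` relabelling used to keep the two merged graphs disjoint are illustrative choices, not notation from the paper.

```python
def insert_node(edges, edge, new_node):
    """Insert Node Rule: replace `edge` by a new node and two new edges."""
    edge = frozenset(edge)
    if edge not in edges:
        raise ValueError("edge is not part of the graph")
    if any(new_node in e for e in edges):
        raise ValueError("new_node must be a fresh node")
    u, v = tuple(edge)
    return (edges - {edge}) | {frozenset((u, new_node)), frozenset((new_node, v))}


def merge(edges_a, edges_b, node_a, node_b):
    """Merge Rule: glue two graphs so that they share exactly one node.

    Nodes are relabelled as (tag, name); node_b of the second graph is
    identified with node_a of the first graph.
    """
    def relabel(tag, n):
        if tag == "b" and n == node_b:
            return ("a", node_a)
        return (tag, n)

    merged = set()
    for tag, edges in (("a", edges_a), ("b", edges_b)):
        for e in edges:
            u, v = tuple(e)
            merged.add(frozenset((relabel(tag, u), relabel(tag, v))))
    return merged


triangle = {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))}
square = insert_node(triangle, (0, 1), 3)      # triangle -> 4-cycle
bowtie = merge(triangle, triangle, 0, 2)       # two triangles sharing one node
```

Starting from a motif (New Motif Rule), repeated calls to these two helpers generate expanded and merged graphs in the sense of Lemma 2.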

However, note that it may also be possible to use these rules to construct graphs that 
are not part of the family. The following lemma shows that when degree-two nodes are 
added to a motif M to form a graph G, all network elements (substrate nodes and links) 
are used when embedding $M$ in $G$ (i.e., $M \mapsto G$).

Lemma 3. Let $M \in (\mathcal{M} \setminus \{C\})$ be an arbitrary two-connected motif, and let $G$ be a graph obtained by applying the Insert Node Rule (Rule 2 of Definition 3) to motif $M$. Then, an embedding $M \mapsto G$ involves all nodes and edges in $G$: at least $\epsilon$ resources are used on all nodes and edges.

Proof. Let $v \in G$. Clearly, if there exists $u \in M$ such that $v = \pi(u)$, then $v$'s capacity is used fully. Otherwise, $v$ was added by Rule 2. Let $a, b$ be the two nodes of $G$ between which Rule 2 was applied, and hence $\{\pi^{-1}(a), \pi^{-1}(b)\} \in E_M$ must be a motif edge. Observe that for these nodes' degrees it holds that $\deg(a) = \deg(\pi^{-1}(a))$ and $\deg(b) = \deg(\pi^{-1}(b))$ since Rule 2 never modifies the degree of the old nodes in the host graph $G$. Since links are of unit capacity, each substrate link can only be used once: at most $\deg(a)$ edge-disjoint paths can originate at $a$, so leaving the path through $v$ unused would yield a contradiction to the degree bound; hence the relaying node $v$ has a load of $\epsilon$. $\square$

Lemma 3 implies that no additional nodes can be inserted to an existing embedding.
In other words, a motif constitutes a "minimal reservation pattern". As we will see, our 
algorithm will exploit this invariant that motifs cover the entire graph knitting, and adds 
simple nodes (of degree 2) only in a later phase. 

Corollary 1. Let $M \in (\mathcal{M} \setminus \{C\})$ and let $G$ be a graph obtained by applying Rule 2 of Definition 3 to motif $M$. Then, no additional node can be embedded on $G$ after embedding $M \mapsto G$.

Next, we want to combine motifs to explore larger "knittings" of graphs. Each motif
pair is glued together at a single node or edge ("attachment point"): We need to be able
to conceptually join two motifs at edges as well because the corresponding edge of the
motif can be expanded by the Insert Node Rule to create a node where the motifs can 
be joined. 



Definition 4 (Motif Sequences, Subsequences, Attachment Points, $\prec$). A motif sequence $S$ is a list $S = (M_1 a_1 a'_1 M_2 \ldots M_k)$ where $\forall i : M_i \in \mathcal{M}$ and where $M_i$ is glued together at exactly one node with $M_{i-1}$ (i.e., $M_i$ is "attached" to a node of motif $M_{i-1}$): the notation $M_{i-1} a_{i-1} a'_{i-1} M_i$ specifies the selected attachment points $a_{i-1}$ and $a'_{i-1}$. If the attachment points are irrelevant, we use the notation $S = (M_1 M_2 \ldots M_k)$ and $M_i^k$ denotes an arbitrary sequence consisting of $k$ instances of $M_i$. If $S$ can be decomposed into $S = S_1 S_2 S_3$, where $S_1$, $S_2$ and $S_3$ are (possibly empty) motif sequences as well, then $S_1$, $S_2$ and $S_3$ are called subsequences of $S$, denoted by $\prec$.

In the following, we will sometimes use the Kleene star notation X* to denote a 
sequence of (zero or more) elements of $X$ attached to each other.






Fig. 1. Left: Motif $A$. Center: Motif $B$. Observe that $A \not\mapsto B$. Right: Motif $A$ is embedded into two consecutive Motifs $B$: solid lines are virtual links mapped on single substrate links, solid curves are virtual links mapped on multiple substrate links, dotted lines are substrate links implementing a multi-hop virtual link, and dashed lines are unused substrate links. Grayed nodes are relay-only nodes. Observe that the central node has a relaying load of $4\epsilon$.

One has to be careful when arguing about the embedding of motif sequences, as
illustrated in Figure 1, which shows a counterexample for $M_i \not\mapsto M_j \Rightarrow \forall k > 0,\ M_i \not\mapsto M_j^k$.
This means that we typically cannot just incrementally add motif occurrences to
discover a certain substructure. This is the motivation for introducing the concept of a 
dictionary which imposes an order on motif sequences and their attachment points. 

3.2 Dictionary Structure and Existence 

In a nutshell, a dictionary is a Directed Acyclic Graph (DAG) defined over all possible motifs $\mathcal{M}$, and imposes an order (poset relationship $\mapsto$) on problematic motif sequences which need to be embedded one before the other (e.g., the composition depicted in Figure 1). To distinguish them from sequences, dictionary entries are called words.

Definition 5 (Dictionary, Words). A dictionary $D(V_D, E_D)$ is a directed acyclic graph (DAG) over a set of motif sequences $V_D$ together with their attachment points. In the context of the dictionary, we will call a motif sequence a word. The links $E_D$ represent the poset embedding relationship $\mapsto$.

Concretely, the DAG has a single root $r$, namely the chain graph $C$ (with two attachment points). In general, the attachment points of each vertex $v \in V_D$ describing a word $w$ define how $w$ can be connected to other words. The directed edges $(v_1, v_2) \in E_D$ represent the transitively reduced embedding poset relation with the chain $C$ context: $C v_1 C$ is embeddable in $C v_2 C$ and there is no other word $C v_3 C$ such that $C v_1 C \mapsto C v_3 C$, $C v_3 C \mapsto C v_2 C$ and $C v_3 C \not\mapsto C v_1 C$ holds. (The chains before and after the words are added to ensure that attachment points are "used": there is no edge between two isomorphic words with different attachment point pairs.)

We require that the dictionary be robust to composition: For any node $v$, let $R_v = \{v' \in V_D, v \mapsto v'\}$ denote the "reachable" set of words in the graph and $\bar{R}_v = V_D \setminus R_v$ all other words. We require that $v \not\mapsto W, \forall W \in Q_v := \bar{R}_v^{*} \setminus R_v^{*}$, where the transitive closure operator $X^{*}$ denotes an arbitrary sequence (including the empty sequence) of elements in $X$ (according to their attachment points).

See Figure 2 for an example. Informally, the robustness requirement means that the word represented by $v$ cannot be embedded in any sequence of "smaller" words, unless a subsequence of this sequence is in the dictionary as well. As an example, a dictionary containing motifs $A$ and $B$ from Figure 1 would contain vertices $A$, $B$ and also $BB$, and a path from $A$ to $BB$. In the following, we use the notation $\max_{v \in V_D}(v \mapsto S)$




Fig. 2. a) Example dictionary with motifs Chain $C$, Cycle $Y$, Diamond $D$, complete bipartite graph $B = K_{2,3}$ and complete graph $K = K_5$. The attachment point pair of each word is black, the other nodes and edges of the words are grey. The edges of the dictionary are locally labeled, which is used in DiCT later. b) A graph that can be constructed from the dictionary words.

to denote the set of "maximal" vertices with respect to their embeddability into $S$: $i \in \max_{v \in V_D}(v \mapsto S) \iff (i \mapsto S) \wedge (\forall j \in \Gamma^{+}(i), j \not\mapsto S)$, where $\Gamma^{+}(v)$ denotes the set of outgoing neighbors of $v$. Furthermore, we say that a dictionary $D$ covers a motif sequence $S$ iff $S$ can be formed by concatenating dictionary words (henceforth denoted by $S \in D^{*}$) at the specified attachment points. More generally, a dictionary covers a graph if it can be formed by merging sequences of $D^{*}$.
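The set of maximal vertices can be computed directly from its definition once the dictionary DAG and an embeddability test are available. The sketch below is only illustrative: the successor map is a made-up toy dictionary (not the one from Figure 2), and `embeds` stands in for the binary replies that an attacker would obtain from the provider.

```python
def maximal_words(successors, embeds):
    """Return all words i with i -> S embeddable and no out-neighbour j embeddable.

    successors: word -> list of out-neighbours in the dictionary DAG
    embeds:     callable, embeds(word) is True iff the word embeds into S
    """
    return {
        word
        for word, out_neighbours in successors.items()
        if embeds(word) and not any(embeds(j) for j in out_neighbours)
    }


# Hypothetical dictionary DAG rooted at the chain C.
toy_dictionary = {"C": ["Y"], "Y": ["D"], "D": ["B", "K"], "B": [], "K": []}
# Hypothetical replies: C, Y and D embed into the target sequence, B and K do not.
embeddable = {"C", "Y", "D"}
result = maximal_words(toy_dictionary, lambda w: w in embeddable)
```

In this toy setting only `D` is maximal: `C` and `Y` embed as well, but each has an out-neighbour that also embeds.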

Let us now derive some properties of the dictionary which are crucial for a proper 
substrate topology discovery. First we consider maximal dictionary words which can 
serve as embedding "anchors" in our algorithm. 

Lemma 4. Let $D$ be a dictionary covering a sequence $S$ of motifs, and let $i \in \max_{v \in V_D}(v \mapsto S)$. Then $i$ constitutes a subsequence of $S$, i.e., $S$ can be decomposed to $S_1 i S_2$, and $S$ contains no words of order at most $i$, i.e., $S_1, S_2 \in (\bar{R}_i \cup \{i\})^{*}$.

Proof. By contradiction assume $i \in \max_{v \in V_D}(v \mapsto S)$ and $i$ is not a subsequence of $S$ (written $i \not\prec S$). Since $D$ covers $S$ we have $S \in D^{*}$ by definition. Since $D$ is a dictionary and $i \mapsto S$ we know that $S \notin Q_i$. Thus, $S \in D^{*} \setminus Q_i$. $S$ has a subsequence of at least one word in $R_i$. Thus there exists $k \in R_i$ such that $k \prec S$. If $k = i$ this implies $i \prec S$ which contradicts our assumption. Otherwise it means that $\exists j \in \Gamma^{+}(i)$ such that $j \mapsto k \prec S$, which contradicts the definition of $i \in \max_{v \in V_D}(v \mapsto S)$ and thus it must hold that $i \prec S$. $\square$

The following corollary is a direct consequence of the definition of $i \in \max_{v \in V_D}(v \mapsto S)$ and Lemma 4 since for a motif sequence $S$ with $S \in (\bar{R}_i \cup \{i\})^{*}$,
all the subsequences of $S$ that contain no $i$ are in $\bar{R}_i^{*}$. As we will see, the corollary is
useful to identify the motif words composing a graph sequence, from the most complex 
words to the least complex ones. 

Corollary 2. Let $D$ be a dictionary covering a motif sequence $S$, and let $i \in \max_{v \in V_D}(v \mapsto S)$. Then $S$ can be decomposed as a sequence $S = T_1 i T_2 i \ldots i T_k$ with $T_j \in Q_i, \forall j = 1, \ldots, k$.
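At the level of word lists, the decomposition in Corollary 2 amounts to cutting the sequence at every occurrence of the maximal word. The following sketch assumes that a sequence is already given as a list of dictionary words and ignores attachment points; the function name `split_at` is hypothetical.

```python
def split_at(sequence, word):
    """Return [T_1, ..., T_k] such that sequence = T_1 word T_2 word ... word T_k.

    Each T_j is a (possibly empty) list of the words between two occurrences.
    """
    parts, current = [], []
    for w in sequence:
        if w == word:
            parts.append(current)
            current = []
        else:
            current.append(w)
    parts.append(current)
    return parts


# Toy sequence over hypothetical words; K plays the role of the maximal word i.
parts = split_at(["C", "K", "Y", "Y", "K", "C"], "K")
```

Applying the same cut again inside each part, with the next maximal word, mirrors the recursive use of the corollary.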

This corollary can be applied recursively to describe a motif sequence as a sequence 
of dictionary entries. Note that a dictionary always exists. 

Lemma 5. There exists a dictionary $D = (V_D, E_D)$ that covers all member graphs $H$ of a motif graph family $\mathcal{H}$ with $n$ vertices. [Proof in Appendix]

3.3 The Dictionary Algorithm 

With these concepts in mind, we are ready to describe our generalized graph discovery 
algorithm called DiCT (cf. Algorithm 1). Basically, DiCT always grows a request graph
$G = H'$ until it is isomorphic to $H$ (the graph to be discovered). This graph growing
is performed according to the dictionary, i.e., we try to embed new motifs in the order 
imposed by the dictionary DAG. 

DiCT is based on the observation that it is very costly to discover additional edges 
between nodes in a 2-connected component: essentially, finding a single such edge re- 
quires testing all possibilities, which is quadratic in the component size. Thus, it is 
crucial to first explore the basic "knitting" of the topology, i.e., the minors which are at 
least 2-connected (the motifs). In other words, we maintain the invariant that there are 
never two nodes $u, v$ which are not $k$-connected in the currently requested graph $H'$
while they are $k$-connected in $H$; no path relevant for the connectivity is overlooked
and needs to be found later.

Nodes and edges which are not contributing to the connectivity need not be explored at this stage yet, as they can be efficiently added later. Concretely, these additional nodes can then be discovered by (1) using an edge expansion (where additional degree two nodes are added along a motif edge), and by (2) adding "chains" $C$ to the nodes (a virtual link $C$ constitutes an edge cut of size one and can again be expanded to an entire chain of nodes using edge expansion).

Let us specify the topological order in which algorithm DiCT discovers the dictionary words. First, for each node $v$ in $V_D$, we define an order on its outgoing edges $\{(v, w) \mid w \in \Gamma^{+}(v)\}$. This order is sometimes referred to as a "port labeling", and each path from the dictionary root (the chain $C$) to a node in $V_D$ can be represented as the sequence of port labels at each traversed node $(l_1, l_2, \ldots, l_k)$, where $l_1$ corresponds to a port number in $C$. We can simply use the lexicographic order on integers, $<_d$: $(a_1, a_2, \ldots, a_{n_1}) <_d (b_1, b_2, \ldots, b_{n_2}) \iff ((\exists m > 0)(\forall i < m)(a_i = b_i) \wedge (a_m < b_m)) \vee (\forall i \in \{1, \ldots, n_1\}, (a_i = b_i) \wedge (n_1 < n_2))$, to associate each vertex with its minimal sequence, and sort vertices of $V_D$ according to their embedding order. Let $r$ be the rank function associating each vertex with its position in this sorting: $r : V_D \to \{1, \ldots, |V_D|\}$ (i.e., $r$ is the topological ordering of $D$).
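The order on port-label sequences is the usual lexicographic order in which a proper prefix precedes its extensions, which is also how Python compares tuples. The sketch below spells out the comparison and derives a rank function from it; the toy dictionary and its port sequences are invented for illustration, and `precedes` and `rank` are hypothetical helper names.

```python
def precedes(a, b):
    """Lexicographic order on port-label sequences.

    a precedes b if they first differ at a position where a is smaller,
    or if a is a proper prefix of b.
    """
    for x, y in zip(a, b):
        if x != y:
            return x < y
    return len(a) < len(b)


def rank(root_paths):
    """Map each vertex to its position (starting at 1) in the induced sorting.

    root_paths: vertex -> list of port-label sequences (tuples), one per
                path from the dictionary root to that vertex.
    """
    # Built-in tuple comparison coincides with `precedes`.
    minimal = {v: min(paths) for v, paths in root_paths.items()}
    ordered = sorted(minimal, key=minimal.get)
    return {v: position for position, v in enumerate(ordered, start=1)}


# Hypothetical dictionary: the root C has the empty port sequence.
toy_paths = {
    "C": [()],
    "Y": [(1,)],
    "D": [(1, 1), (2,)],
    "K": [(1, 1, 1), (2, 1)],
    "B": [(2,)],
}
ranks = rank(toy_paths)
```

Vertices reachable over several paths (such as `D` above) are ranked by their smallest port sequence, as described in the text.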

The fact that subsequences can be defined recursively using a dictionary (Lemma 4 and Corollary 2) is exploited by algorithm DiCT. Concretely, we apply Corollary 2 to gradually identify the words composing a graph sequence, from the most complex words to the least complex ones. This is achieved by traversing the dictionary depth-first, starting from the root $C$ up to a maximal node: algorithm DiCT tests the nodes of $\Gamma^{+}(v)$ in increasing port order as defined above. As a shorthand, the word $v \in V_D$ with $r(v) = i$ is written as $D[i]$; similarly $D[i] < D[j]$ holds if $r(D[i]) < r(D[j])$, a notation that will be useful to express the fact that $D[j]$ will be detected before $D[i]$ by algorithm DiCT. As a consequence, the word of a sequence $S$ that gets matched first is uniquely identified: it is $i = \arg\max_x (D[x] \mapsto S) := \max\{r(v) \mid v \in \max_{v' \in V_D}(v' \mapsto S)\}$: $i$ denotes the maximal word in $S$.

Algorithm DiCT distinguishes whether the subsequences next to a word $v \in V_D$ are empty ($\emptyset$) or chains ($C$), and we will refer to the subsequence before $v$ by Bf and to the subsequence after $v$ by Af. Concretely, while recursively exploring a sequence between two already discovered parts $T_<$ and $T_>$ we check whether the maximal word $v$ is directly next to $T_<$ (i.e., $T_< v, \ldots, T_>$) or $T_>$ or both ($\emptyset$), or whether $v$ is somewhere in the middle. In the latter case, we add a chain ($C$) to be able to find the greatest possible word in a next step.

DiCT uses tuples of the form $(i, j, Bf, Af)$ where $(i, j) \in \mathbb{N}^2$ and $(Bf, Af) \in \{\emptyset, C\}^2$, i.e., $D[i]$ denotes the maximal word in $D$, $j$ is the number of consecutive occurrences of the corresponding word, and $Bf$ and $Af$ represent the words before and after $D[i]$. These tuples are lexicographically ordered by the total order relation $>$ on the set of possible $(i, j, Bf, Af)$ tuples defined as follows: let $t = (i, j, Bf, Af)$ and $t' = (i', j', Bf', Af')$ be two such tuples. Then $t > t'$ iff $i > i'$, or $i = i' \wedge j > j'$, or $i = i' \wedge j = j' \wedge Bf = C \wedge Bf' = \emptyset$, or $i = i' \wedge j = j' \wedge Bf = Bf' \wedge Af = C \wedge Af' = \emptyset$.
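If $\emptyset$ is encoded as 0 and $C$ as 1, the order on tuples coincides with plain lexicographic comparison of 4-tuples. The sketch below writes out the four clauses of the definition so that this can be checked; the integer encoding and the names `EMPTY`, `CHAIN` and `greater` are assumptions made for illustration.

```python
EMPTY, CHAIN = 0, 1  # assumed encoding of the empty subsequence and the chain C


def greater(t, u):
    """Return True iff tuple t = (i, j, Bf, Af) is greater than u in the DiCT order."""
    (i, j, bf, af), (i2, j2, bf2, af2) = t, u
    return (
        i > i2
        or (i == i2 and j > j2)
        or (i == i2 and j == j2 and bf == CHAIN and bf2 == EMPTY)
        or (i == i2 and j == j2 and bf == bf2 and af == CHAIN and af2 == EMPTY)
    )


# Selecting the maximal candidate tuple, as required in Line 1 of
# find_motif_sequence(), can then rely on built-in tuple comparison.
candidates = [(2, 1, EMPTY, CHAIN), (2, 3, EMPTY, EMPTY), (1, 5, CHAIN, CHAIN)]
best = max(candidates)
```

Here `best` is `(2, 3, EMPTY, EMPTY)`: the word index is compared first, then the number of repetitions, and only then the chain markers.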

With these definitions we can prove that algorithm DiCT is correct.

Theorem 1. Given a dictionary for $\mathcal{H}$, algorithm DiCT correctly discovers any $H \in \mathcal{H}$.

Proof. We first prove that the claim is true if H forms a motif sequence (without edge 
expansion). Subsequently, we study the case where the motif sequence is expanded by 
Rule 2, and finally tackle the general composition case. 

Discovery of motif sequences: Due to Lemma 4 it holds that for $w$ chosen when Line 1 of find_motif_sequence() is executed for the first time, $S$ is partitioned into three subsequences $S_1$, $w$ and $S_2$. Subsequently find_motif_sequence() is executed on each of the subsequences $S' \in \{S_1, S_2\}$ recursively if $C \mapsto S'$, i.e., if the subsequences are not empty. Thus find_motif_sequence() computes a decomposition as described in Corollary 2 recursively. As each of the words used in the decomposition is a subsequence of $S$ and find_motif_sequence() does not stop until no more words can be added to any subsequence, it holds that all nodes of $S$ will be discovered eventually. In other words, $\pi^{-1}(u)$ is defined for all $u \in S$.

As a next step we assume $S' \ne S$ to be the sequence of words obtained by DiCT to derive a contradiction. Since $S' := H'$ is the output of algorithm DiCT and is hence embeddable in $H$: $S' \mapsto S$, there exists a valid embedding mapping $\pi$. Given $u, v \in V(S)$, we denote by $E^{\pi}(S')$ the set of pairs $\{u, v\}$ for which $\{\pi^{-1}(u), \pi^{-1}(v)\} \in E(S')$. Now assume that $S$ and $S'$ do not lead to the same resource reservations "$\pi(S) \ne \pi(S')$". Hence there are some inconsistencies between the substrate and the output of algorithm DiCT: $\Phi = \{\{u, v\} \in E(S) \setminus E^{\pi}(S') \cup E^{\pi}(S') \setminus E(S)\}$. With each of these "conflict" edges, one can associate the corresponding word $W_{u,v}$ (resp. $W'_{u,v}$) in $S$ (resp. $S'$). If a given conflict edge spans multiple words, we only consider the words with the highest index as defined by DiCT. We also define $i_{u,v} = r(W_{u,v})$ (resp. $i'_{u,v} = r(W'_{u,v})$). Since $S'$ and $S$ are by definition not isomorphic, $i'_{u,v} \ne i_{u,v}$.

Let $j = \max_{(u,v) \in \Phi}(i_{u,v})$ be the index of the greatest word embeddable on the substrate containing an inconsistency, and $j'$ be the index of the corresponding word detected by DiCT.

(i) Assume $j > j'$: a lower order motif was erroneously detected. Let $J^{+}$ (and $J^{-}$) be the set of dictionary entries that are detected before (after) $D[j]$ (if any) in $S$ by DiCT. Observe that the words in $J^{+}$ were perfectly detected by DiCT, otherwise we are in Case (ii). We can decompose $S$ as an alternating sequence of words of $J^{+}$ and other words using Corollary 2: $S = T_1 J_1(a_1) T_2 \ldots T_k$ with $J_i(a_i) \in (J^{+})^{*}$ and attachment points $a_i$ and $T_i \in (J^{-})^{*}$. As the words in $J^{+}$ are the same in $S'$, we can write $S' = T'_1 J_1 T'_2 \ldots T'_k$ (using Corollary 2 as well).

Let $T$ be the sequence among $T_1, \ldots, T_k$ that contains our misdetected word $D[j]$, and $T'$ the corresponding sequence in $S'$. Observe that $T' \mapsto T$ since the words $J_i$ cut the sequences of $S$ and $S'$ into subsequences $T_i, T'_i$ that are embeddable. Observe that $D[j] \mapsto T$ since $T$ contains it. Note that in the execution of find_motif_sequence() when $D[j']$ was detected the higher indexed words had been detected correctly by DiCT in previous executions of this subroutine. Hence, $T_<$ and $T_>$ cannot contain any words leading to edges in $\Phi$. Thus $(j', \cdot, \cdot, \cdot) < (j, \cdot, \cdot, \cdot)$ which contradicts Line 1 of find_motif_sequence().

(ii) Now assume $j' > j$: a higher order motif was erroneously detected. Using the same decomposition as step (i), we define $J'^{+}$ as the set of words perfectly detected, and therefore decompose $S$ and $S'$ as sequences $S = T_1 J'_1 T_2 \ldots J'_{k-1} T_k$ and $S' = T'_1 J'_1 T'_2 \ldots J'_{k-1} T'_k$, with $J'_i \in (J'^{+})^{*}$ and the property that each $T'_i \mapsto T_i$.

Let $T'$ be the sequence among $T'_1, \ldots, T'_k$ that contains our misdetected word $D[j']$, and $T$ the corresponding sequence in $S$. Since $D[j'] \prec T'$, $D[j'] \mapsto T'$. Thus, since $T' \mapsto T$, we deduce $D[j'] \mapsto T$ which is a contradiction with $j'$ and Corollary 2.

The same arguments can be applied recursively to show that conflicts in $\Phi$ of smaller indices cannot exist either.

Expanded motif sequences. As a next step, we consider graphs that have been extended
by applying node insertions (Rule 2) to motif sequences, so-called expanded motif
sequences: we prove that if $H$ is an expanded motif sequence $S$, then algorithm DiCT
correctly discovers $S$. Given an expanded motif sequence $S$, replacing all degree-two
nodes with an edge connecting their neighbors, unless a cycle of length three would
be destroyed, leads to a unique pure motif sequence $T$ with $T \mapsto S$. For the
corresponding embedding mapping $\pi$ it holds that $V(S) \setminus \pi(T)$ is exactly the set
$\mathcal{R}$ of removed nodes. Applying find_motif_sequence() to an expanded motif sequence
discovers this pure motif sequence $T$ by using the nodes in $\mathcal{R}$ as relay nodes. All
nodes in $\mathcal{R}$ are then discovered in edge_expansion(), where the reverse operation,
node insertion, is carried out as often as possible. It follows that each node in $S$ is
either discovered in find_motif_sequence() if it occurs in a motif, or in
edge_expansion() otherwise.
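
The contraction of degree-two nodes described above can be sketched in a few lines. This is only an illustration under our own assumptions, not part of DiCT: the function name `contract_degree_two` and the adjacency-dictionary representation are hypothetical, and the rule "unless a cycle of length three would be destroyed" is implemented as "do not contract a node whose two neighbors are already adjacent".

```python
def contract_degree_two(adj):
    """Contract degree-two nodes of a simple undirected graph.

    adj: dict mapping each node to the set of its neighbors.
    A degree-two node is replaced by an edge between its neighbors,
    unless those neighbors are already adjacent (the node then lies
    on a triangle, which must be preserved).
    Returns (contracted graph, list of removed nodes).
    """
    g = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    removed = []
    changed = True
    while changed:
        changed = False
        for u in sorted(g):  # sorted only to make the result deterministic
            if len(g[u]) != 2:
                continue
            a, b = sorted(g[u])
            if b in g[a]:
                continue  # u, a, b form a triangle: keep u
            g[a].discard(u)
            g[b].discard(u)
            g[a].add(b)
            g[b].add(a)
            del g[u]
            removed.append(u)
            changed = True
            break  # restart the scan, degrees have changed
    return g, removed
```

On a cycle of five nodes, for instance, two nodes are removed and a triangle remains; on a path, all inner nodes are removed and a single edge remains. The removed nodes play the role of the set $\mathcal{R}$ in the text.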

Combining expanded sequences. Finally, it remains to combine the expanded
sequences. Clearly, since motifs describe all parts of the graph which are at least
2-connected, the graph remaining after collapsing motifs cannot contain any cycles: it is
a tree. On this graph, DiCT behaves like Tree, but instead of attaching chains,
entire sequences are attached to different nodes. Along the unique sequence paths
between two nodes, DiCT fixes the largest words first, and the claim follows by the same
arguments as used in the proofs for tree and cactus graphs. $\Box$



Algorithm 1 Motif Graph Discovery DiCT

1: $H' := \{\{v\}, \emptyset\}$ /* current request graph */, $\mathcal{P} := \{v\}$ /* set of unexplored nodes */
2: while $\mathcal{P} \neq \emptyset$ do
3:   choose $v \in \mathcal{P}$, $T :=$ find_motif_sequence($v$, $\emptyset$, $\emptyset$)
4:   if ($T \neq \emptyset$) then $H' := H' v T$, add all nodes of $T$ to $\mathcal{P}$, for all $e \in T$ do edge_expansion($e$)
5:   else remove $v$ from $\mathcal{P}$

find_motif_sequence($v$, $T^<$, $T^>$)
1: find maximal $i, j, BF, AF$ s.t. $H' v\, (T^<)\, BF\, (D[i])^j\, AF\, (T^>) \mapsto H$ where $BF, AF \in \{\emptyset, C\}^2$ /* issue requests */
2: if ($(i, j, BF, AF) = (0, 0, C, \emptyset)$) then return $T^< C\, T^>$
3: if ($BF = C$) then $BF =$ find_motif_sequence($v$, $T^<$, $(D[i])^j AF\, T^>$)
4: if ($AF = C$) then $AF =$ find_motif_sequence($v$, $T^< BF\, (D[i])^j$, $T^>$)
5: return $BF\, (D[i])^j\, AF$

edge_expansion($e$)
1: let $u, v$ be the endpoints of edge $e$, remove $e$ from $H'$
2: find maximal $j$ s.t. $H' v C^j u \mapsto H$ /* issue requests */
3: $H' := H' v C^j u$, add newly discovered nodes to $\mathcal{P}$

3.4 Request Complexity 

The focus of DiCT is on generality rather than performance, and indeed, the resulting 
request complexities can often be high. However, as we will see, there are interesting 
graph classes which can be solved efficiently. 

Let us start with a general complexity analysis. The requests issued by
DiCT are constructed in Line 1 of find_motif_sequence() and in Line 2 of
edge_expansion(). We will show that the request complexity of the latter is
linear in the number of edges of the host graph while the request complexity of
find_motif_sequence() depends on the structure of the dictionary. Essentially, an
efficient implementation of Line 1 of find_motif_sequence() in DiCT can be seen
as the depth-first exploration of the dictionary $D$ starting from the chain $C$. More
precisely, at a dictionary word $v$ requests are issued to see if one of the outgoing
neighbors of $v$ could be embedded at the position of $v$. As soon as one of the replies is
positive, we follow the corresponding edge and continue recursively from there, until no
outgoing neighbors can be embedded. Thus, the number of requests issued before we reach a
vertex $v$ can be determined easily.
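
The exploration just described can be sketched as follows. This is a simplified illustration under our own assumptions: the dictionary is given as a hypothetical mapping `out_neighbors` from each word to its outgoing neighbors in a fixed order, and `embeds` is a hypothetical oracle standing in for one embedding request; the word names in the usage example are placeholders, not the dictionary of the paper.

```python
def explore_dictionary(out_neighbors, embeds, start="C"):
    """Greedy depth-first descent through a dictionary graph.

    out_neighbors: dict mapping a word to the list of its outgoing
                   neighbors, in a fixed (port-labeled) order.
    embeds:        hypothetical oracle; embeds(word) is True iff the
                   request with `word` at the current position succeeds.
    Returns (last word that could be embedded, number of requests).
    """
    current = start
    requests = 0
    while True:
        for candidate in out_neighbors.get(current, []):
            requests += 1
            if embeds(candidate):
                current = candidate  # follow the first positive reply
                break
        else:
            # no outgoing neighbor can be embedded: stop here
            return current, requests
```

For example, with `out_neighbors = {"C": ["B", "Y"], "Y": ["D"], "D": ["K4"]}` and an oracle accepting only `"Y"` and `"D"`, the descent ends at `"D"` after four requests: one negative reply for `"B"`, two positive replies, and one final negative reply for `"K4"`.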

Recall that DiCT tests vertices of a dictionary $D$ according to a fixed port
labeling scheme. For any $v \in V_D$, let $p(C, v)$ be the set of paths from $C$ to $v$
(each path including $C$ and $v$). In the worst case, discovering $v$ costs $cost(v)$

Lemma 6. The request complexity of Line 1 of find_motif_sequence($v'$, $T^<$, $T^>$) to
find the maximal $i, j, BF, AF$ such that $H' v'\, (T^<)\, BF\, (D[i])^j\, AF\, (T^>) \mapsto H$, where
$BF, AF \in \{\emptyset, C\}^2$ and $H'$ is the current request graph, is $O(\max_{v \in V_D} cost(v) + j)$.

Proof. To reach a word $v = D[i]$ in $V_D$ with depth-first traversal there is exactly
one path between the chain $C$ and $v$. DiCT issues a request for at most all the
outgoing neighbors of the nodes on this path. After $v$ has been found, the highest $j$ where
$H' v\, (T^<)\, BF\, (v^j)\, AF\, (T^>) \mapsto H$ has to be determined. To this end, another $j + 1$
requests are necessary. Thus the maximum of $cost(v) + j$ over all words $v \in V_D$
determines the request complexity. $\Box$
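
The $j + 1$ term in the proof corresponds to a simple linear probe: the repetition count is increased until the first negative reply. A minimal sketch, assuming a hypothetical oracle `embeds_power(j)` that answers whether $j$ repetitions of the word can be embedded and that is monotone in $j$:

```python
def max_repetitions(embeds_power):
    """Find the largest j such that embeds_power(j) is True.

    embeds_power: hypothetical oracle, assumed monotone (True for all
                  j up to some maximum, False afterwards).
    Returns (j, number of requests); the count is j + 1 because one
    final negative reply is needed to stop.
    """
    j = 0
    requests = 0
    while True:
        requests += 1
        if not embeds_power(j + 1):
            return j, requests
        j += 1
```

With an oracle that accepts up to three repetitions, the probe issues four requests and returns $j = 3$.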

When additional nodes are discovered by a positive reply to an embedding request,
the request complexity between this and the previous positive reply can be
amortized among the newly discovered nodes. Let num_nodes($v$) denote the number
of nodes in the motif sequence of the node $v$ in the dictionary.

Theorem 2. The request complexity of algorithm DiCT is at most $O(n \cdot \Delta + m)$, where
$m$ denotes the number of edges of the inferred graph $H \in \mathcal{H}$, and $\Delta$ is the maximal
ratio between the cost of discovering a word $v$ in $D$ and num_nodes($v$), i.e.,
$\Delta = \max_{v \in V_D} \{cost(v) / num\_nodes(v)\}$.

Proof. Each time Line 1 of find_motif_sequence() is called, either at least one new
node is found or no other node can be embedded between the current sequences (one
request is necessary for the latter result). If one or more new nodes are discovered, the
request complexity can be amortized by the number of nodes found: if $v$ is the maximal
word found in Line 1 of find_motif_sequence(), then it is responsible for at most
$cost(v)$ requests due to Lemma 6. If it occurs more than once at this position, only
one additional request is necessary to discover even more nodes (plus one superfluous
request if no more occurrences of $v$ can be embedded there). Amortizing the request
number over the number of discovered nodes results in $\Delta$ requests per discovered node.
All other requests are due to edge_expansion($e$), where additional nodes are placed along
edges. Clearly, these costs can be amortized by the number of edges in $H$: for each edge
$e \in E(H)$, at most two embedding requests are performed (including a "superfluous"
request which is needed for termination when no additional nodes can be added). $\Box$
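
To make the quantities in Theorem 2 concrete, the following sketch evaluates $\Delta$ and the expression $n \cdot \Delta + m$ for a given dictionary. The function names and the per-word values in the usage note are illustrative assumptions, not measurements from the paper, and the result should be read up to the constant factors hidden in the $O$-notation.

```python
def delta(cost, num_nodes):
    """Maximal ratio cost(v) / num_nodes(v) over all dictionary words.

    cost, num_nodes: dicts keyed by dictionary word (assumed inputs).
    """
    return max(cost[w] / num_nodes[w] for w in cost)


def request_bound(n, m, cost, num_nodes):
    """Evaluate n * Delta + m, the expression inside the O() of Theorem 2."""
    return n * delta(cost, num_nodes) + m
```

For instance, with the made-up values `cost = {"C": 1, "Y": 2}` and `num_nodes = {"C": 1, "Y": 3}`, we get $\Delta = 1$, and a host graph with $n = 10$ nodes and $m = 12$ edges yields the value 22.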



3.5 Examples 

Let us consider concrete examples to provide some intuition for Theorem 1 and
Theorem 2. The execution of DiCT for the graph in Figure 2 b) is illustrated in Figure 3.




Fig. 3. Motif sequence tree of the graph in Figure 2 b). The squares and the edges between them
depict the motif composition, the shaded squares belong to the motif sequence YC^BDYD^ 
discovered in the first execution of find_motif_sequence() (chains, cycles, diamonds, and the
complete bipartite graph over two times three nodes are denoted by C, Y, D and B respectively).
Subsequently, the found edges are expanded before calling find_motif_sequence() another
four times to find Y and three times to find C.

A fundamental graph class are trees. Since a tree does not contain any 2-connected
structures, it can be described by a single motif: the chain C. Indeed, if DiCT
is executed with a dictionary consisting of the singleton motif set {C}, it is equivalent
to a recursive version of Tree from Lllil and seeks to compute maximal paths. For 
the cactus graph, we have two motifs, and the request complexity is the same as for the
algorithm described in ifTTI . 

Corollary 3. Trees can be described by one motif (the chain C), and cactus graphs by
two motifs (the chain C and the cycle Y). The request complexity of DiCT on trees and
cactus graphs is $O(n)$.

Proof. We present the arguments for cactus graphs only, as trees constitute a subset of
the cactus family. The absence of diamond graph minors implies that a cactus graph
does not contain two closed faces which share a link. Thus, there can exist at most two
different (not even disjoint) paths between any node pair, and the corresponding motif
subgraph forms a cycle Y (or a triangle). Since the cycle has only one attachment point
pair, $\Delta$ of $D$ is constant. Consequently, a linear request complexity follows directly
from Theorem 2 due to the planarity of cactus graphs (i.e., $m \in O(n)$). $\Box$

An example where the dictionary is efficient although the connectivity of the
topology can be high are block graphs. A block graph is an undirected graph in which every
bi-connected component (a block) is a clique. A generalized block graph is a block
graph where the edges of the cliques can contain additional nodes. In other words, in
the terminology of our framework, the motifs of generalized block graphs are cliques.
For instance, cactus graphs are generalized block graphs where the maximal clique size
is three.



Corollary 4. Generalized block graphs can be described by the motif set of cliques. The
request complexity of DiCT on generalized block graphs is $O(m)$, where $m$ denotes the
number of edges in the host graph.



Proof. The framework dictionary for generalized block graphs consists of the set of
cliques, as a clique with $k$ nodes cannot be embedded on sequences of cliques with
fewer than $k$ nodes. As there are three attachment point pairs for each complete graph
with four or more nodes, DiCT can be applied using a dictionary that contains three
entries for each motif with more than three nodes (num_nodes() > 3). Thus, the $i$-th
dictionary entry has $\lfloor i/3 \rfloor + 3$ nodes for $i > 1$ and $cost(D[i]) \leq 3(i + 2)$, and $\Delta$ of $D$
is hence in $O(1)$. Consequently, the complexity for generalized block graphs is $O(m)$
due to Theorem 2. $\Box$
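
The step from the two bounds in the proof to a constant ratio can be spelled out as follows. This is a worked check that takes the bounds as stated above (a cost of at most $3(i + 2)$ and $\lfloor i/3 \rfloor + 3$ nodes for the $i$-th entry) and uses only $\lfloor x \rfloor > x - 1$; the constant 9 is what this particular estimate yields and is not claimed to be tight.

```latex
\[
\frac{cost(D[i])}{num\_nodes(D[i])}
\;\le\; \frac{3(i+2)}{\lfloor i/3 \rfloor + 3}
\;<\; \frac{3(i+2)}{i/3 + 2}
\;=\; \frac{9(i+2)}{i+6}
\;<\; 9 .
\]
```

Hence the ratio is bounded by a constant independent of $i$, which is the property used to obtain the $O(m)$ bound.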



On the other hand, Theorem 2 also states that highly connected graphs may
require $\Omega(n^2)$ requests, even if the dictionary is small. In the next section, we will study
whether this happens in "real world graphs".
Fig. 4. Results of DiCT when run on different Internet and power grid topologies. a) Number of
nodes in different autonomous systems (AS). We computed the set of motifs of these graphs as
described in Definition 2 and counted the number of nodes that: (i) belong to a tree structure at the
fringe of the network, (ii) have degree 2 and belong to two-connected motifs, and finally (iii) are
part of the largest motif. b) The fraction of nodes that can be discovered with the 12-motif dictionary
represented in Figure c). d) An example network where tree nodes are colored yellow, line nodes
are green, attachment point nodes are red and the remaining nodes blue.



4 Experiments 

To complement our theoretical results and to validate our framework on realistic graphs,
we dissected the ISP topologies provided by the Rocketfuel mapping engine. In addition,
we also dissected the topology of a European electricity distribution grid (labeled "grid" in
the legends). Figure 4 a) provides some statistics about the aforementioned topologies.
Since DiCT discovers both tree and degree-2 nodes in linear time, this figure shows
that most of each topology can be discovered quickly. The inspected topologies are
composed of a large bi-connected component (the largest motif), and some other small
and simple motifs. Figure 4 b) represents the fraction of each topology that can be
discovered by DiCT using only a 12-motif dictionary (see Figure 4 c)). Interestingly,
this small dictionary is efficient on 10 different topologies, and contains motifs that are
mostly symmetrical. This might stem from the human-engineered origin of the targeted
topologies. Finally, Figure 4 d) provides an example of such a topology.
A Appendix

Lemma 1. The embedding relation $\mapsto$ applied to any family $\mathcal{G}$ of undirected graphs
(short: $(\mathcal{G}, \mapsto)$) forms a partially ordered set (a poset).

Proof. A poset structure $(S, \preceq)$ over a set $S$ requires that $\preceq$ is a (reflexive, transitive,
and antisymmetric) order which may or may not be partial. To show that $(\mathcal{G}, \mapsto)$, the
embedding order defined over a given set of graphs $\mathcal{G}$, is a poset, we examine the three
properties in turn.

Reflexive, $G \in \mathcal{G} \mapsto G \in \mathcal{G}$: By using the identity mapping $\pi : G = (V, E) \mapsto G = (V, E)$,
which embeds each node and link to itself, the claim is proved.

Transitive, $A \in \mathcal{G} \mapsto B \in \mathcal{G}$ and $B \in \mathcal{G} \mapsto C \in \mathcal{G}$ implies $A \in \mathcal{G} \mapsto C \in \mathcal{G}$:
Let $\pi_1$ denote the embedding function for $A \in \mathcal{G} \mapsto B \in \mathcal{G}$ and let $\pi_2$ denote the
embedding function for $B \in \mathcal{G} \mapsto C \in \mathcal{G}$, which must exist by our assumptions. We
will show that then also a valid embedding function $\pi$ exists to map $A$ to $C$. Regarding
the node mapping, we define $\pi_V$ as $\pi_V := \pi_{1V} \circ \pi_{2V}$, i.e., the result of first mapping
the nodes according to $\pi_{1V}$ and subsequently according to $\pi_{2V}$. We first show that
$\pi_V$ is a valid mapping from $A$ to $C$ as well. First, $\forall v_A \in V_A$, $\pi_V(v_A)$ maps $v_A$ to a
single node in $V_C$, fulfilling the first condition of the embedding (see Definition 1).
Ignoring relay capacities (which are studied together with the conditions on the links
below), Condition (ii) of Definition 1 is also fulfilled since the mapping $\pi_{1V}$ ensures
that no node in $V_B$ exceeds its capacity, and can hence safely be mapped to $V_C$. Let
us now turn our attention to the links. We use the following mapping $\pi_E$ for the edges.
Note that $\pi_{1E}$ maps a single link $e$ to an entire (but possibly empty) path in $B$ and $\pi_{2E}$
then maps the corresponding links $e'$ in $B$ to a walk in $C$. We can transform any of
these walks into paths by removing cycles; this can only lower the resource costs. Since
$\pi_{1E}$ maps to a subset of $E_B$ only and since $\pi_{2E}$ can embed all edges of $B$, all link
capacities are respected up to relay costs. However, note also that by the mapping $\pi_1$
and for relay costs $\epsilon > 0$, each node $v_B \in V_B$ can either not be used at all, be fully
used as a single endpoint of a link $e_A \in E_A$, or serve as a relay for one or more links.
Since both end-nodes and relay nodes are mapped to separate nodes in $C$, capacities are
respected as well. Conditions (iii) and (iv) hold as well.
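
The composition used in the transitivity argument can be illustrated by a small sketch. It only covers the structural part (node images and link paths) and ignores all capacities and relay costs; the function names and the dictionary-based representation of an embedding are our own assumptions and do not appear in the formal model.

```python
def walk_to_path(walk):
    """Remove cycles from a walk (list of nodes) so that no node repeats."""
    path = []
    pos = {}  # node -> index in path
    for node in walk:
        if node in pos:
            cut = pos[node]
            for dropped in path[cut + 1:]:
                del pos[dropped]  # forget the nodes on the removed cycle
            path = path[:cut + 1]
        else:
            pos[node] = len(path)
            path.append(node)
    return path


def compose_embeddings(pi1_nodes, pi1_edges, pi2_nodes, pi2_edges):
    """Compose an embedding of A in B with an embedding of B in C.

    pi1_nodes: dict, node of A -> node of B
    pi1_edges: dict, link (a, a2) of A -> list of B nodes (path in B)
    pi2_nodes: dict, node of B -> node of C (defined for all nodes of B)
    pi2_edges: dict, frozenset({b, b2}) -> list of C nodes (path in C,
               stored in either direction)
    Returns (node mapping A -> C, link mapping A -> paths in C).
    """
    pi_nodes = {a: pi2_nodes[b] for a, b in pi1_nodes.items()}
    pi_edges = {}
    for edge, b_path in pi1_edges.items():
        walk = [pi2_nodes[b_path[0]]]
        for b, b_next in zip(b_path, b_path[1:]):
            segment = list(pi2_edges[frozenset((b, b_next))])
            if segment[0] != pi2_nodes[b]:
                segment.reverse()  # orient the segment along the walk
            walk.extend(segment[1:])
        pi_edges[edge] = walk_to_path(walk)  # cycles only add cost
    return pi_nodes, pi_edges
```

Each link of $A$ is first mapped to its path in $B$, every link on that path is replaced by its path in $C$, and the resulting walk is shortened to a path as in the proof.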

Antisymmetric, $A \in \mathcal{G} \mapsto B \in \mathcal{G}$ and $B \in \mathcal{G} \mapsto A \in \mathcal{G}$ implies $A = B$, i.e., $A$
and $B$ are isomorphic and have the same weights: First observe that if the two networks
differ in size, i.e., $|V_A| \neq |V_B|$ or $|E_A| \neq |E_B|$, then they cannot be embedded to
each other: W.l.o.g., assume $|V_A| > |V_B|$, then since nodes of $V_A$ cannot be split
into multiple nodes of $V_B$ (cf. Definition 1), there exists a node $v_A \in V_A$ to which no
node from $V_B$ is mapped. This however implies that node $\pi_1(v_A) \in V_B$ must have
available capacities to host also $v_A$, contradicting our assumption that nodes cannot
be split in the embedding. Similarly, if $|E_A| \neq |E_B|$, we can obtain a contradiction
with the single path argument. Thus, not only the total number of nodes and links in
$A$ and $B$ must be equivalent but also the total amount of node and link resources. So
consider a valid embedding $\pi_1$ for $A \in \mathcal{G} \mapsto B \in \mathcal{G}$ and a valid embedding $\pi_2$ for
$B \in \mathcal{G} \mapsto A \in \mathcal{G}$, and assume $|V_A| = |V_B|$ and $|E_A| = |E_B|$. It holds that $\pi_1$ and
$\pi_2$ define an isomorphism between $A$ and $B$: Clearly, since $|V_A| = |V_B|$, $\pi_1$ and $\pi_2$
define a permutation on the vertices. W.l.o.g., consider any link $\{v_A, v'_A\} \in E_A$. Then,
also $\{\pi_1(v_A), \pi_1(v'_A)\} \in E_B$: $|\{\pi_1(v_A), \pi_1(v'_A)\}| = 0$ would violate the node capacity
constraints in $B$, and $|\{\pi_1(v_A), \pi_1(v'_A)\}| > 1$ requires $|E_B| > |E_A|$. $\Box$

Lemma 5. There exists a dictionary $D = (V_D, E_D)$ that covers all member graphs
$H$ of a motif graph family $\mathcal{H}$ with $n$ vertices.

Proof. We present a procedure to construct such a dictionary $D$. Let $\mathcal{M}_n$ be the set
of all motifs with $n$ nodes of the graph family $\mathcal{H}$. For each motif $m \in \mathcal{M}_n$ with $x$
possible attachment point pairs (up to isomorphisms), we add $x$ dictionary words to
$V_D$, one for each attachment point pair. The resulting set is denoted by $V_M$. For each
sequence of $V_M^*$ with at most $n$ nodes, we add another word to $V_D$ (with the unused
attachment points of the first and the last subword). There is an edge $e \in E_D$ if the
transitive reduction of the embedding relation with context includes an edge between
two words. We now prove that $D$ is a dictionary, i.e., it is robust to composition. Let
$i \in V_D$. Observe that $R_i$ contains all compositions of words with at most $n$ nodes in
which $i$ can be embedded. Consequently, no matter which sequences are in $R_i$, it holds
that $i$ cannot be embedded in a sequence in $Q_i$: the robustness condition is satisfied.
Since $H$ has $n$ vertices, and since $D$ contains all possible motifs of at most $n$ vertices,
$D$ covers $H$. $\Box$

Note that the proof of Lemma 5 only addresses the composition robustness for
sequences of up to $n$ nodes. However, it is clear that $|V(G')| > |V(H)| \Rightarrow G' \not\mapsto H$,
and therefore no "mismatch" can happen. Finite dictionaries with this
adapted composition can also be applied in the lemmata proved above; there is only
a small notational change necessary in the proof of Lemma p] (Note that it is always 
possible to determine the number of nodes n by binary search using 0(log n) requests.) 
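
The binary search mentioned in the note can be sketched as follows. The text does not specify which request graphs are used; the sketch therefore assumes a hypothetical monotone oracle `fits(k)` that returns True exactly when a request of size $k$ can be embedded, i.e., when $k \leq n$. An initial doubling phase finds an upper limit, so the total number of requests stays logarithmic in $n$.

```python
def find_n(fits):
    """Determine n with O(log n) calls to a monotone oracle.

    fits: hypothetical oracle, fits(k) is True iff k <= n.
    Returns (n, number of requests issued).
    """
    requests = 0
    hi = 1
    # doubling phase: find the first power of two that does not fit
    while True:
        requests += 1
        if not fits(hi):
            break
        hi *= 2
    lo = hi // 2  # largest size known to fit (0 if nothing fits)
    # binary search with invariant: lo fits (or is 0), hi does not fit
    while hi - lo > 1:
        mid = (lo + hi) // 2
        requests += 1
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo, requests
```

For a host with $n = 10$ this sketch issues eight requests: five in the doubling phase (sizes 1, 2, 4, 8, 16) and three in the binary search (sizes 12, 10, 11).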



